On Reinforcement Learning of Control Actions in Noisy and Non-Markovian Domains
Abstract
If reinforcement learning (RL) techniques are to be used for "real world" dynamic system control, the problems of noise and plant disturbance will have to be addressed. This study investigates the effects of noise/disturbance on five different RL algorithms: Watkins' Q-Learning (QL); Barto, Sutton and Anderson's Adaptive Heuristic Critic (AHC); Sammut and Law's modern variant of Michie and Chambers' BOXES algorithm; and two new algorithms developed during the course of this study. Both these new algorithms are conceptually related to QL; both algorithms, called P-Trace and Q-Trace respectively, provide for substantially faster learning than straight QL overall, and for dramatically faster learning (by up to a factor of 200) in the special case of learning in a noisy environment for the dynamic system studied here (a pole-and-cart simulation). As well as speeding learning, both the P-Trace and Q-Trace algorithms have been designed to preserve the "convergence with probability 1" formal properties of standard QL, i.e. that they be provably "correct" algorithms for Markovian domains under the same conditions for which QL is guaranteed to be correct. We present both arguments and experimental evidence that "trace" methods may prove to be both faster and more powerful in general than TD (Temporal Difference) methods. The potential performance improvements of trace over pure TD methods may turn out to be particularly important when learning is to occur in noisy or stochastic environments, and in the case where the domain is not well modelled by Markovian processes. A surprising result to emerge from this study is evidence for hitherto unsuspected chaotic behaviour with respect to learning rates exhibited by the well-studied AHC algorithm. The effect becomes more pronounced as noise increases.
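To make the contrast between one-step TD backups and trace-based credit assignment concrete, the following is a minimal Python sketch comparing plain tabular Q-learning against a simple replacing-eligibility-trace variant on a toy chain task. It is not the paper's P-Trace or Q-Trace algorithm (whose details are not reproduced in this abstract); the environment, the constants, and the trace-decay parameter LAM are illustrative assumptions.

# Minimal sketch: tabular one-step Q-learning vs. a simple eligibility-trace
# variant on a toy chain MDP. Not the paper's P-Trace/Q-Trace; all constants
# are illustrative assumptions.
import random

N_STATES, GOAL = 6, 5          # states 0..5, reward only on reaching state 5
ACTIONS = [-1, +1]             # move left / move right
ALPHA, GAMMA, LAM, EPS = 0.1, 0.95, 0.8, 0.1

def step(s, a):
    s2 = min(max(s + a, 0), GOAL)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def greedy(Q, s):
    return max(range(len(ACTIONS)), key=lambda i: Q[s][i])

def run(use_trace, episodes=200):
    Q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        e = [[0.0, 0.0] for _ in range(N_STATES)]   # eligibility traces
        s, done = 0, False
        while not done:
            a = random.randrange(2) if random.random() < EPS else greedy(Q, s)
            s2, r, done = step(s, ACTIONS[a])
            delta = r + (0.0 if done else GAMMA * max(Q[s2])) - Q[s][a]
            if use_trace:
                e[s][a] = 1.0                        # replacing trace
                for i in range(N_STATES):
                    for j in range(2):
                        Q[i][j] += ALPHA * delta * e[i][j]
                        e[i][j] *= GAMMA * LAM       # decay all traces
            else:
                Q[s][a] += ALPHA * delta             # plain one-step backup
            s = s2
    return Q

print("Q-learning   :", [round(max(q), 2) for q in run(False)])
print("Trace variant:", [round(max(q), 2) for q in run(True)])

With the trace enabled, a single delayed reward updates every state-action pair visited earlier in the episode rather than only the most recent one; this deeper, per-episode credit assignment is the kind of mechanism the abstract argues can outpace pure TD updates, particularly when rewards are noisy.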
Similar works
Description and Acquirement of Macro-Actions in Reinforcement Learning
Reinforcement learning is a framework in which agents learn from interaction with their environments. Research in this area has generally focused on Markov decision process (MDP) domains, but a real-world domain may be non-Markovian. In this paper, we develop a new description of macro-actions for non-Markov decision process (NMDP) domains in reinforcement learning. A macro-action is an action control struct...
An Echo State Model of Non-Markovian Reinforcement Learning (Dissertation)
There exists a growing need for intelligent, autonomous control strategies that operate in real-world domains. Theoretically, the state-action space must exhibit the Markov property in order for reinforcement learning to be applicable. Empirical evidence, however, suggests that reinforcement learning also applies to doma...
C-Trace: A New Algorithm for Reinforcement Learning of Robotic Control
There has been much recent interest in the potential of using reinforcement learning techniques for control in autonomous robotic agents. How to implement effective reinforcement learning in a real-world robotic environment still involves many open questions. Are standard reinforcement learning algorithms like Watkins' Q-learning appropriate, or are other approaches more suitable? Some specific...
Memory Approaches to Reinforcement Learning in Non-Markovian Domains
Reinforcement learning is a type of unsupervised learning for sequential decision making. Q-learning is probably the best-understood reinforcement learning algorithm. In Q-learning, the agent learns a mapping from states and actions to their utilities. An important assumption of Q-learning is the Markovian environment assumption, meaning that any information needed to determine the optimal acti...
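As a concrete illustration of one simple memory approach (not necessarily the one used in the cited work), the sketch below wraps the last k observation/action pairs into a composite state, so that a tabular learner keyed on this window can distinguish observations that a purely Markovian learner would alias. The window size k, the class name WindowMemory, and the toy observation stream are illustrative assumptions.

# Minimal sketch of a fixed-window memory for non-Markovian observations.
# Hypothetical helper, not taken from the cited paper.
from collections import deque

class WindowMemory:
    def __init__(self, k=3):
        self.k = k
        self.buf = deque(maxlen=k)

    def reset(self, obs):
        self.buf.clear()
        self.buf.append((obs, None))
        return self.state()

    def update(self, action, obs):
        self.buf.append((obs, action))
        return self.state()

    def state(self):
        # Hashable composite state usable as a Q-table key.
        return tuple(self.buf)

# Usage: the learner indexes its Q-table by mem.state() instead of the raw,
# aliased observation.
mem = WindowMemory(k=2)
s = mem.reset(obs="corridor")
s = mem.update(action="forward", obs="corridor")   # same obs, different history
print(s)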
On Using Discretized Cohen-Grossberg Node Dynamics for Model-Free Actor-Critic Neural Learning in Non-Markovian Domains
We describe how multi-stage non-Markovian decision problems can be solved using actor-critic reinforcement learning by assuming that a discrete version of Cohen-Grossberg node dynamics describes the node-activation computations of a neural network (NN). Our NN (i.e., agent) is capable of rendering the process Markovian implicitly and automatically in a totally model-free fashion without learning...